### Loading libraries
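```r
# library(tidyverse) already attaches dplyr, forcats and ggplot2;
# the explicit calls are kept for clarity
library(gapminder)
library(tidyverse)
library(dplyr)
library(forcats)
library(ggplot2)
library(plotly)
```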
Task: Choose one dataset (of your choice) and a variable to explore. After ensuring the variable(s) you're exploring are indeed factors, you should:

- Drop factor levels;
- Reorder levels based on knowledge from the data.

Explore the effects of re-leveling a factor in a tibble by:

- comparing the results of `arrange()` on the original and re-leveled factor;
- plotting a figure before and after re-leveling the factor (make sure to assign the factor to an aesthetic of your choosing).

These explorations should involve the data, the factor levels, and at least two figures (before and after).
First, let's check whether the variable of choice, `country`, is a factor.

#### Checking class
```r
class(gapminder$country)   # confirms that the country variable is a factor
## [1] "factor"
# levels(gapminder$country)  # uncomment to list every level of the country factor
nlevels(gapminder$country) # number of countries (levels) in the dataset
## [1] 142
```
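To confirm in one step which gapminder columns are factors (not only `country`), one option is to apply `is.factor()` to every column. This is a small sketch assuming the `gapminder` tibble from the `gapminder` package, as loaded above; `factor_cols` is an illustrative name.

```r
library(gapminder)

# is.factor() applied to each column gives a named logical vector
factor_cols <- sapply(gapminder, is.factor)
factor_cols

# names of the factor columns only
names(factor_cols)[factor_cols]
```

Both `country` and `continent` come back as factors, so either could serve as the variable to explore.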
Now, let's select all the countries in Asia and see why `droplevels()` is useful. The first chunk below counts the levels without dropping anything: although only Asian countries remain after filtering, `nlevels()` still reports all 142 countries in the dataset, because `filter()` removes rows but keeps every level of the factor. The second chunk applies `droplevels()` and returns the actual number of Asian countries, 33. This shows why unused levels should be dropped after subsetting a factor.
```r
# Filter all the countries in the continent of Asia
Asia_country <- gapminder %>%
  filter(continent == "Asia")
nlevels(Asia_country$country) # counts levels without dropping the unused ones
## [1] 142

# Drop the levels that no longer appear in the data
Asia_country_drop <- Asia_country %>%
  droplevels()
nlevels(Asia_country_drop$country)
## [1] 33
```
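Note that `droplevels()` on a tibble drops unused levels from every factor column, so `continent` is also reduced to the single level `"Asia"`. When only one column should change, forcats' `fct_drop()` is an alternative. The sketch below compares the two; the object names `asia`, `asia_all_dropped` and `asia_country_dropped` are illustrative.

```r
library(gapminder)
library(dplyr)
library(forcats)

asia <- gapminder %>%
  filter(continent == "Asia")

# droplevels() on a data frame drops unused levels in all factor columns
asia_all_dropped <- droplevels(asia)
nlevels(asia_all_dropped$country)    # 33
nlevels(asia_all_dropped$continent)  # 1

# fct_drop() only touches the column it is applied to
asia_country_dropped <- asia %>%
  mutate(country = fct_drop(country))
nlevels(asia_country_dropped$country)    # 33
nlevels(asia_country_dropped$continent)  # still 5
```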
To show how reordering helps, let's consider a single year, 2007, and draw a lollipop plot of life expectancy for each Asian country.
```r
# Lollipop plot with the original (alphabetical) level order
q <- Asia_country %>%
  filter(year == 2007) %>%   # year is an integer column; one row per country
  ggplot(aes(x = country, y = lifeExp)) +
  geom_segment(aes(x = country, xend = country, y = 0, yend = lifeExp), color = "skyblue") +
  geom_point(color = "blue", size = 3, alpha = 0.5) +
  labs(
    x = "Country",
    y = "Life Expectancy") +
  theme_bw() +
  coord_flip()

q %>%
  ggplotly()
```
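A quick way to confirm the level order that drives the first plot's country axis is to inspect the levels of the 2007 subset directly. This is a sketch; `asia_2007` is an illustrative name.

```r
library(gapminder)
library(dplyr)

asia_2007 <- gapminder %>%
  filter(continent == "Asia", year == 2007) %>%
  droplevels()

# the level order used for the axis
head(levels(asia_2007$country))

# life expectancy read off in level order is not sorted
asia_2007$lifeExp[order(asia_2007$country)]
```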
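The `country` levels are stored alphabetically, so the plot above orders the countries alphabetically rather than by life expectancy, which makes them hard to compare. Reordering the levels by `lifeExp` with `fct_reorder()` sorts the lollipops by life expectancy:

```r
# Lollipop plot with country levels reordered by life expectancy
p <- Asia_country %>%
  filter(year == 2007) %>%
  # reorder once, so every layer below uses the same level order
  mutate(country = fct_reorder(country, lifeExp)) %>%
  ggplot(aes(x = country, y = lifeExp)) +
  geom_segment(aes(xend = country, y = 0, yend = lifeExp), color = "skyblue") +
  geom_point(color = "blue", size = 3, alpha = 0.5) +
  theme_bw() +
  labs(
    x = "Country",
    y = "Life Expectancy") +
  coord_flip()

p %>%
  ggplotly()
```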
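The task also asks to compare `arrange()` on the original and the re-leveled factor. `arrange()` sorts a factor by its level order rather than by its labels as strings, so after `fct_reorder()` sorting by `country` returns the rows in order of increasing life expectancy. A minimal sketch for 2007; the object names are illustrative.

```r
library(gapminder)
library(dplyr)
library(forcats)

asia_2007 <- gapminder %>%
  filter(continent == "Asia", year == 2007) %>%
  droplevels()

# Original factor: alphabetical levels, so arrange() sorts alphabetically
arranged_original <- asia_2007 %>%
  arrange(country) %>%
  select(country, lifeExp)

# Re-leveled factor: levels ordered by life expectancy
asia_2007_releveled <- asia_2007 %>%
  mutate(country = fct_reorder(country, lifeExp))

arranged_releveled <- asia_2007_releveled %>%
  arrange(country) %>%
  select(country, lifeExp)

head(arranged_original)
head(arranged_releveled)  # lowest life expectancy first
```

Applied to `Asia_country` across all years, `fct_reorder()` would instead order countries by their median life expectancy, since `median` is its default summary function.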